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1. Introduction 

In Quantum Chromodynamics (QCD), the topological susceptibility (Xt) is the most essen- 
tial quantity to measure the topological charge fluctuation of the QCD vacuum, which plays an 
important role in breaking the Ua{1) symmetry. Theoretically, Xt is defined as 

X, = jd'x{p{x)p{Q)) = ^ (1.1) 

where ^ 

P{x) = y^£iivXa^[Fliv{x)Fxa{x)], Qt = Jd^xp{x), (1.2) 

Qt is the topological charge (which is an integer for QCD), p(x) is the topological charge density, 
and n is the space-time volume. In the chiral perturbation theory (ChPT), the quark mass depen- 
dence of Xt was derived at the tree level ^ in 1992, and it has been extended to the one-loop order 
recently [^. For Nf = 2, these ChPT formulas can be written as 

Xt/rriu = — -, — , (1.3) 
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where £ is the chiral condensate, F-,^ the pion decay constant, and Kj are related to low energy 
constants L,. A salient feature of Xt is that it is suppressed in the chiral limit (m„ — > 0) due to the 



internal quark loops. Most importantly, (1.3) and (1.4) provide a viable way to extract £ and other 
low energy constants. 

However, one cannot determine Qt reliably using the link variables, due to the rather strong 
fluctuations at short distances. Instead, we consider the Atiyah-Singer index theorem 

2f = =index(^), (1.5) 

where n± is the number of zero modes of the massless Dirac operator Si = y^ [d^ + igA^ ) with ± 
chirality. Thus, by couting the zero modes of the massless Dirac operator, we can determine Qt and 
Xt for an ensemble of gauge configurations. 

Recently, the topological susceptibility has been measured in unquenched lattice QCD with 
exact chiral symmetry, for Nf = 2 lattice QCD with overlap fermion in a fixed topology [Q], and 
Nf = 2+ \ lattice QCD with domain-wall fermion These results are in good agreement with 



the ChPT at the tree level (1.3) 



In this work, we measure the topological charge of the gauge configurations generated by 
lattice simulations of 2 flavors QCD on a 16^ x 32 lattice, with the Optimal Domain- Wall Fermion 
(ODWF) [|] at Ns = 16, and plaquette gauge action at j8 = 5.90. Mathematically, ODWF is a 
theoretical framework which can preserve the chiral symmetry optimally for any finite A^^. Thus 
the artifacts due to the chiral symmetry breaking with finite A^^ can be reduced to the minimum, 
especially in the chiral regime. The 4D effective operator of massless ODWF is 

D = mo[l + 75V(^H0], V(//w)= |~";,7^!:\ T,= \~^f\ (1.6) 
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which is exactly equal to the Zolotarev optimal rational approximation of the overlap Dirac oper- 
ator [0]. That is, Sopt{Hw) = H„Rz{H„), where Rz{H„) is the optimal rational approximation of 

We outline our scheme of projecting the low-lying eigenmodes of D as follows. Using [DD^, 75] 
0, and the representation 

D|0) =A(0)|0), A(0)=mo(l+^''■^), (1.7) 

we obtain 



s±\e)^ = p±H^Rz{H^)p±\e)± = ±co^e\e)^, P^ = _ii, (i.8) 



1±75 

2 

where \Q) = P^\6) + P-\d) = \Q)^ + Thus, we can perform the eigenmode projection on 
the operator S± instead of D. Moreover, \Q)^ are related to each other through the relation 

|0) = ^(755-^'-'^)|0)±, 0y^O,±7r,±27r,... (1.9) 
I sm t7 

Assuming the zero modes only residing in either + or — chirality, we project the low-lying modes 
of D as follows. First, we project the smallest eigenmodes of 5+. If D has zero modes with 
positive chirality, then the smallest eigenvalues of 5+ aie equal to —1, and the corresponding full 



eigenvectors can be obtained using Eq. (1.9). Otherwise, we have to check whether D has zero 
modes with negative chirality by projecting the largest eigenmodes of S_. If D has zero modes 
with negative chirality, the largest eigenvalues of S_ are equal to +1, and the corresponding full 



eigenvectors can be obtained using Eq. ( |1.9| ). Finally, for non-zero low-lying eigenmodes, we can 



pick either + or — chirality of Eq. ( |1.8| ) for projection, then obtain the full eigenvectors using Eq. 

dLl). 



2. The TRLan algorithm 

In general, the low-mode projection of a large sparse matrix A can be carried out via iterative 
process, such as the Lanczos algorithm or the Arnoldi algorithm. The common procedure is to 
construct an orthonormal basis from the Krylov subspace starting from an initial vector ro, 

^(A,ro) = span {ro, Aro, A^ro, . . . , A'"-Vo} . (2.1) 

The linear combinations of {AVq,/ = 0, 1, ... m — 1} form the Ritz vectors of A, which converge to 
the eigenvectors of A as m becomes very large. 

Since and S± are Hermitian matrices, it is natural to use the Lanczos algorithm. The basic 
Lanczos process (see Figjl]-(a)) is to construct the following factorization 

^Qm = QmTm + l^mlm+l^mj (2-2) 

where {q\, q2, . .. , q,n} form an orthonormal basis, Q,„ = [qi,q2, ■ ■ ■ ,qm], is the m-th column of 
the mxm identity matrix, and is a tridiagonal matrix with diagonal elements a, and sub-diagonal 
elements j8,-. Then the Ritz pairs (A,-,x,) can be obtained from 

Pm — UfjjTffiU,n, X,n — QirJJmi (2.3) 
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Input: ro, j8o = ||ro||, <?o = 
For: /= 1,2, ... 

• qi = n-i/A-i 

• p=Aqi 

• ai = q]p 

• A- = lk/|| 



(a) (b) 
Figure 1: (a) The basic Lanczos process; (b) The Lanczos algorithm with Thick-Restart. 
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: k' = 240, 


m = 340 




S+: k' = 


100, m = 


200, np = 


240 


method 


N_restarts 


#of A-v 


time(s) 


speed up 


N_restarts 


# of A-v 


time(s) 


speed up 


ARPACK 


388 


35390 


136460 


1.00 


13 


1050 


112632 


1.00 


TRLan 


999 


100140 


572951 


0.24 


12 


1300 


105790 


1.06 


v-TRLan 


383 


59145 


78058 


1.75 


11 


1030 


90496 


1.24 



Table 1: The benchmark of low-mode projection of and 5+ for a non-trivial gauge configuration (2r = 3) 
at j3 = 5.9 and niga = 0.01. Here k' is the number of low-modes projected, m is the dimension of Krylov 
subspace, and rip is the number of low-modes of used for low-mode preconditioning in the projection of 
the low-modes of S+. 




where f,„ is diagonal with eigenvalues A,-, Um is a unitary matrix, and Xm= [xi,X2,. ■ ■ ,x,„]. As m 
becomes very large, the Ritz pairs converge to the eigenmodes of A. 

However, the Lanczos algorithm suffers from the following problems. Some specious Ritz 
values may repeatedly appear when m goes to larger values. This is due to the fact that the vectors 
{qi} lose orthogonality rapidly in the finite precision arithmatics. Hence one has to apply the 
Gram-Schmidt procedure to re-orthogonalize them during the iteration. Also, it requires a large 
number of qj to project a few Ritz pairs, consuming a large amount of memory and slowing down 
the computation. A way to overcome these problems is to perform a restart. 

There are several restart schemes, in which the Thick-Restart Lanczos algorithm (TRLan) 
turns out to be the most efficient for our purpose. The TRLan algorithm is illustrated in Fig|l]-(b), 
where the squares in the diagram stand for the square matrix T,„, in which the non-zero matrix 
elements are denoted by black lines. 

Obviously, the performance of the TRLan algorithem depends on the truncated dimension 
k and the dimension m of the Krylov subspace. Now the question is what is the optimal value 
of k (for a given m) such that for each restart, the reduction factor dj of the residual of the 7-th 
(not converged) Ritz pair is maximized, while the number of floating-point operations (FPOs) is 
minimized. In other words, at each restart, we try to find the optimal k by maximizing the object 
function [ p^ 

^W^iJpoi' ".^^-C+^r), (24, 
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where is the Chebyshev polynomial of degree m — k. For the number of FPOs, we 

only count the dominant parts which are directly related to the dimension m, namely, the re- 
orthogonalization and the update of Ritz vectors. This modified algorithm is called the Adaptive 
Thick-Restart Lanczos algorithm, which can attain substantial performance gain, with faster con- 
vergence and fewer FPOs. 
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Figure 2: The number of converged Ritz pairs and the dimension k versus the number of restarts for the 
V-TRLan algorithm: (a) H„ [m = 340); (b) 5+ of the overlap operator (w = 240). 



For benchmarking, we project the low-lying eigenmodes of H^^ and 5+, using a nontrivial con- 
figuration {Qt = 3) generated by the 2-flavor QCD simulation on a 16^ X 32 lattice, with ODWF 
fermion at Ng = 16 and mie„a = 0.01, and the plaquette gauge action at j8 = 5.90. In Table |l|, we 
present our benchmark results for three different methods: ARPACK (Arnoldi algorithm with Im- 
plicit Restart), TRLan (Thick-Restart Lanczos), and v-TRLan (Adaptive Thick-Restart Lanczos). 
For TRLan (by definition k = k', where k' is the number of low-modes we want to project, and 
k is the truncated dimension at each restart), we see that its rate of convergence tends to be very 
slow for Hw, with a rather large condition number (r^ 10^). On the other hand, for v-TRLan, k is 
automatically adjusted to be an optimal value in each restart {k' < k < m, see Fig. ^, thus the rate 
of convergence is dramatically improved. It turns out that, for both and S±, v-TRLan is faster 
than TRLan and ARPACK. Therefore, we use v-TRLan to project the low-lying eigenmodes of H„ 
andD. 



3. Lattice setup 



Simulations are carried out for two flavors {Nf = 2) QCD on a 16^ x 32 lattice at a lattice 



spacing a ~ 0.11 fm, for eight sea quark masses in the range m^a = 0.01,0.02, • • • ,0.08 [|11|]. For 
the gluon part, the plaquette action is used at j3 = 5.90. For the quark part, the optimal domain-wall 
fermion is used with A/^ = 16. After discarding 300 trajectories for thermalization, we accumulated 
3000 — 3200 trajectories in total for each sea quark mass. From the saturation of the binning error 
of the plaquette, as well as the evolution of the topological charge, we estimate the autocorrelation 
time to be around 10 trajectories. Thus we sample one configuration every 10 trajectories, then we 
have 270 — 290 configurations for each sea quark mass. 
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For each configuration, we calculate the zero modes and 80 conjugate pairs of the lowest-lying 
eignmodes of the overlap Dirac operator. We outline our procedures as follows. First, we project 
240 low-lying eigenmodes of using v-TRLan alogorithm, where each eigenmode has a residual 
less than lO^^^, Then we approximate the sign function of the overlap operator by the Zolotarev 
optimal rational approximation with 64 poles, where the coefficents are fixed with A^^,^ = (6.4)2, 
and A 2^-^ equal to the maximum of the 240 projected eigenvalues of H^. Then the sign function 
error is less than lO^^^. Using the 240 low-modes of and the Zolotarev approximation with 
64 poles, we project the zero modes plus 80 conjugate pairs of of the lowest-lying eignmodes of 
the overlap operator with the v-TRLan algorithm, where each eigenmode has a residual less than 

10-12. 



4. Results 

In Fig. |3[ we plot the topological susceptibility Xt = {{Qt — {Qi)Y) / ^ as a function of the 
sea quark mass niq. For all eight sea quark masses, our data are well fitted by a linear function 
F + GiUq with the intercepts =0.47(2.37) x 10-^ and the slope G = 5.50(4) x 10-^. Evidently, the 
intercept is consistent with zero, in agreement with the ;^PT expectation ( |1.3[ ). Equating the slope 
to ^/Nf, we obtain £ = 0.01 10(7). In order to convert £ to that in the MS scheme, we calculate the 
renormalization factor Z^^{2 GeV) using the non-perturbative renormalization technique through 
the RUMOM scheme |JT|], and obtain Z^(2 GeV) = 0.5934(10) Then the value of Z is 

transcribed to 

1^(2 GeV) = [261(5)(7) MeV]^ (4.1) 

which is in agreement with our previous results : l'^^(2 GeV) = [245(5)(12) MeV]^ for Nf = 2 
lattice QCD with overlap fermion in a fixed topology [|], and L^{2 GeV) = [253(4) (6) MeV]^ for 
Nf = 2 + \ lattice QCD with domain- wall fermion [||]. The statistical error represents a combined 
error (including those due to a^^ and Z^^). The systematic error is estimated from the turncation 
of higher order effects and the uncertainty in the determination of lattice spacing with ro = 0.49 
fm. Since our calculation is done at a single lattice spacing, the discretization error cannot be 
quantified reliably, but we do not expect much larger error because our lattice action is free from 
0(a) discretization effects. 



5. Concluding remarks 

We have measured the topological charge of the gauge configurations generated by lattice 
simulations of 2 flavors QCD on a 16^ x 32 lattice, with Optimal Domain- Wall Fermion (ODWF) 
at Ns = 16 and plaquette gauge action at j8 = 5.90. We use the Adaptive Thick-Restart Lanczos 
algorithm to compute the low-lying eigenmodes of H^, and the overlap Dirac operator. Our result of 
Xt agrees with the sea-quark mass dependence predicted by the ChPT at the tree level, and provides 
a good determination of the chiral condensate. Furthermore, our result of Xt also agrees with the 



the NLO ChPT, Eq. (1.4). We will present the details in a forthcoming paper [14]. 
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Figure 3: The topological susceptibility of two-flavors QCD with the optimal domain-wall fermion. 
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